A Straightforward Framework for Video Retrieval Using CLIP

Authors

Abstract

Video retrieval is a challenging task that aims to match a text query to a video, or vice versa. Most existing approaches to this problem rely on annotations made by users. Although simple, this approach is not always feasible in practice. In this work, we explore the application of the language-image model CLIP to obtain video representations without the need for such annotations. This model was explicitly trained to learn a common space in which images and text can be compared. Using various techniques described in this document, we extended its application to videos, obtaining state-of-the-art results on the MSR-VTT and MSVD benchmarks.
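The idea of a shared image-text space can be illustrated with a minimal sketch. It assumes that per-frame image embeddings and text embeddings have already been produced by CLIP's image and text encoders. It aggregates frames by mean pooling, which is only one simple way to extend an image model to video and is not necessarily the exact technique used in the paper. Videos (or captions) are then ranked by cosine similarity. All function names and the toy 3-dimensional vectors are hypothetical stand-ins, not real CLIP outputs.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def video_embedding(frame_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical aggregation: mean-pool unit-length frame embeddings, then renormalize."""
    if not frame_embeddings:
        raise ValueError("a video needs at least one frame embedding")
    dim = len(frame_embeddings[0])
    normed = [l2_normalize(f) for f in frame_embeddings]
    mean = [sum(f[i] for f in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def rank_videos(text_embedding: Sequence[float],
                videos: Dict[str, Sequence[Sequence[float]]]) -> List[str]:
    """Text-to-video retrieval: video ids sorted from most to least similar."""
    q = l2_normalize(text_embedding)
    scores = {vid: cosine(q, video_embedding(frames)) for vid, frames in videos.items()}
    return sorted(scores, key=scores.get, reverse=True)


def rank_texts(video_frames: Sequence[Sequence[float]],
               texts: Dict[str, Sequence[float]]) -> List[str]:
    """Video-to-text retrieval: caption ids sorted from most to least similar."""
    v = video_embedding(video_frames)
    scores = {tid: cosine(v, l2_normalize(e)) for tid, e in texts.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy vectors standing in for CLIP frame and text embeddings (assumption).
    videos = {
        "cooking": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
        "soccer": [[0.0, 0.9, 0.3], [0.1, 1.0, 0.2]],
    }
    captions = {"a person cooks": [1.0, 0.0, 0.0], "a soccer match": [0.0, 1.0, 0.0]}
    print(rank_videos([1.0, 0.0, 0.0], videos))
    print(rank_texts(videos["soccer"], captions))
```

Because both modalities live in the same space, the same similarity function serves text-to-video and video-to-text retrieval; only the direction of the ranking changes. Mean pooling ignores frame order, which keeps the sketch simple but discards temporal information.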


Similar Resources

EMD-Based Video Clip Retrieval by Many-to-Many Matching

This paper presents a new approach for video clip retrieval based on Earth Mover’s Distance (EMD). Instead of imposing one-to-one matching constraint as in [11, 14], our approach allows many-to-many matching methodology and is capable of tolerating errors due to video partitioning and various video editing effects. We formulate clip-based retrieval as a graph matching problem in two stages. In ...


Structuring Indexes for Video Clip

For the multimedia boom to amount to much, it must become easier for producers of such systems to find, manage, and organize large amounts of source materials in a variety of formats. Video materials are, in many ways, the most demanding component format in multimedia systems, being in effect multimedia presentations all on their own. Video clips combine moving images with sound, and may incorp...


A Unified Framework for Video Summarization, Browsing and Retrieval

Video content can be accessed by using either a top-down approach or a bottom-up approach [1, 2, 3, 4]. The top-down approach, i.e. video browsing, is useful when we need to get an “essence” of the content. The bottom-up approach, i.e. video retrieval, is useful when we know exactly what we are looking for in the content, as shown in Fig. 1. In video summarization, what “essence” the summary sh...


A probabilistic framework for semantic video indexing, filtering, and retrieval

Semantic filtering and retrieval of multimedia content is crucial for efficient use of the multimedia data repositories. Video query by semantic keywords is one of the most difficult problems in multimedia data retrieval. The difficulty lies in the mapping between low-level video representation and high-level semantics. We therefore formulate the multimedia content access problem as a multimedi...


A Videography Analysis Framework for Video Retrieval and Summarization

Overview: In this work, we focus on developing features and approaches to represent and analyze videography styles in unconstrained videos. By unconstrained videos, we mean typical consumer videos with significant content complexity and diverse editing artifacts, mostly with long duration. We present an approach for unsupervised videography analysis for unconstrained videos. Intuitively, each v...



Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-77004-4_1